Lesson 3 Tidy data
… in which we explore the concept of Tidy Data and learn more advanced data wrangling techniques
3.2 Tidy data
3.2.1 What is tidy data, and why use it?

Figure from Wickham and Grolemund [10], https://r4ds.had.co.nz/tidy-data.html
palmerpenguins::penguins
3.2.2 Make data tidy
with the tidyr package.
“Happy families are all alike; every unhappy family is unhappy in its own way”
— Leo Tolstoy (https://tidyr.tidyverse.org/articles/tidy-data.html)
Let’s make some data tidy!
table1
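As a small illustration of why the tidy layout is convenient, the sketch below computes a rate directly from table1. The name table1_rate and the choice of "cases per 10,000 people" are assumptions made for this example, not part of the lesson.

```r
library(dplyr)
library(tidyr) # ships the example dataset table1

# table1 is already tidy: one row per country and year,
# one column per variable (country, year, cases, population).
# A new variable is therefore a single mutate() away.
table1_rate <- table1 %>%
  mutate(rate = cases / population * 10000) # cases per 10,000 people

table1_rate
```

With the untidy versions of the same data (table4a, table4b, table5), the same calculation needs reshaping first.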
3.2.5 pivot_longer
table4a %>%
  pivot_longer(-country, names_to = "year", values_to = "cases")
table4a
table4b %>%
  pivot_longer(-country, names_to = "year", values_to = "population")
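# Pivot every column except country into year/value pairs;
# values_column is the name (a string) for the new value column.
clean_wide_data <- function(data, values_column) {
  data %>%
    pivot_longer(-country, names_to = "year", values_to = values_column)
}
clean4a <- table4a %>%
  clean_wide_data("cases")
clean4b <- table4b %>%
  clean_wide_data("population")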
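One possible next step, sketched here under assumptions, is to combine the two long tables into a single tidy table. The use of left_join() and the names cases_long, population_long and tidy4 are choices made for this example; the block repeats the pivoting so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Pivot both wide tables into long form (same idea as clean_wide_data)
cases_long <- table4a %>%
  pivot_longer(-country, names_to = "year", values_to = "cases")
population_long <- table4b %>%
  pivot_longer(-country, names_to = "year", values_to = "population")

# Match rows on country and year, then store year as an integer
tidy4 <- left_join(cases_long, population_long, by = c("country", "year")) %>%
  mutate(year = as.integer(year))

tidy4
```

The result has the same columns as table1: country, year, cases and population.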
3.2.7 unite
table5
table5 %>%
  unite("year", century, year, sep = "") %>%
  separate(rate, c("cases", "population")) %>%
  # convert each text column to a number, one by one
  mutate(
    year = parse_number(year),
    cases = parse_number(cases),
    population = parse_number(population)
  )
table5 %>%
  unite("year", century, year, sep = "") %>%
  separate(rate, c("cases", "population")) %>%
  # same conversion, applied to the named columns in one step
  mutate(
    across(c(year, cases, population), parse_number)
  )
table5 %>%
  unite("year", century, year, sep = "") %>%
  separate(rate, c("cases", "population")) %>%
  # same conversion, applied to every column except country
  mutate(
    across(-country, parse_number)
  )
3.2.8 Another example
billboard
- explicit vs implicit NAs (na.omit)
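To make the difference between explicit and implicit missing values concrete, here is a minimal sketch. It uses the values_drop_na argument of pivot_longer() rather than na.omit; the names long_with_na and long_without_na are made up for this example.

```r
library(dplyr)
library(tidyr) # ships the billboard dataset

# Explicit NAs: every song gets a row for every week column,
# with NA where the song was no longer in the charts.
long_with_na <- billboard %>%
  pivot_longer(starts_with("wk"), names_to = "week", values_to = "placement")

# Implicit NAs: the rows with missing placements are simply absent.
long_without_na <- billboard %>%
  pivot_longer(
    starts_with("wk"),
    names_to = "week",
    values_to = "placement",
    values_drop_na = TRUE
  )

nrow(long_with_na)
nrow(long_without_na)
```

Both tables carry the same information; the second one just leaves the missing weeks out instead of spelling them out as NA.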
billboard %>%
  pivot_longer(starts_with("wk"), names_to = "week", values_to = "placement") %>%
  mutate(week = parse_number(week))
tidy_bilboard <- billboard %>%
  pivot_longer(starts_with("wk"),
    names_to = "week",
    values_to = "placement",
    names_prefix = "wk", # strip the "wk" prefix from the column names
    names_transform = list(week = as.integer) # store week as an integer
  )
3.3 More shapes for data
- omitted:
- matrices
- arrays
3.5 Resources
- tidyr documentation
- purrr documentation
- stringr documentation for working with text and a helpful cheatsheet for the regular expressions mentioned in the video
